This paper considers the possibility that the daily average Particulate Matter (PM10) concentration is a seasonal fractionally integrated process with time-dependent variance (volatility). In this context, one convenient extension is to consider the SARFIMA model (Reisen et al., 2006a,b) with GARCH-type innovations. The model is theoretically justified, and its usefulness is corroborated by an application to PM10 concentration in the city of Cariacica-ES (Brazil). The fitted model was able to capture the dynamics of the series. The out-of-sample forecast intervals were improved by considering heteroscedastic errors, and they were able to identify the periods of higher volatility.

Keywords: Fractional differencing, Long-memory, ARFIMA, Seasonality, Heteroscedasticity, PM10 contaminant.

1 Introduction 

The issue of airborne ambient Particulate Matter (PM) has become a well-recognized research topic in environmental sciences. Epidemiological studies have reported strong associations between PM10 concentrations (PM with an aerodynamic diameter less than or equal to 10 μm) and several adverse health effects, including respiratory problems in children, death, and increased hospital admissions for cardiopulmonary and respiratory conditions; see, for example, Touloumi et al. (2004), Perez et al. (2007), Zelm et al. (2008) and references therein.

In the literature, several modeling strategies have been developed or optimized for the study and forecasting of PM concentration in urban areas, for example, Diaz Robles et al. (2008), Konovalov et al. (2009) and others. Among these modeling efforts, statistical models based on multiple regression (Stadlober et al., 2008) and time series tools, such as the Box-Jenkins Autoregressive Integrated Moving Average (ARIMA) model, have been widely used for this class of problems (Goyal et al., 2006; Liu, 2009).

Models which adequately describe the physical behavior of the data are essential for accurate forecasting in any area of application. In this paper, a Seasonal Autoregressive Fractionally Integrated Moving Average (SARFIMA) model with more than one fractional parameter and a non-constant conditional error variance (heteroscedastic errors) is used to illustrate how such a model can be useful to fit and forecast series with seasonality, volatility and long-range dependence (long-memory) features. These phenomena are common characteristics of time series in many areas of interest. For example, Windsor & Toumi (2001) analyzed the variability of the pollutants ozone and PM with long-memory techniques, which were also used by Baillie et al. (1996) to model and forecast temperature series. Karlaftis & Vlahogianni (2009) studied the memory and volatility properties of transportation time series. Kumar & Ridder (2010) focused on modeling and forecasting ozone episodes through heteroscedastic (GARCH) processes combined with an ARIMA model.

Roughly speaking, seasonality is a phenomenon where the observation at time t is highly correlated with the one at time t − s, where s is called the season length. It is important to use statistical tools that take the seasonal effect into account. However, some studies focusing on the forecasting of daily PM10 concentrations do not account for the seasonal influence of weather patterns (Goyal et al., 2006). Other studies, such as Stadlober et al. (2008), try to control the seasonal component by using dummy variables, which is suitable only when seasonality is present in the mean structure alone.

A time series with volatility is characterized by a non-constant conditional variance, i.e., the error variance changes as a function of time. This contrasts with the usual assumption that the variance of the process is constant. However, if the variance is time-varying, accommodating the conditional variance allows the forecast variance to be reduced in periods of low volatility (and enlarged in periods of high volatility), which leads to more accurate forecast confidence intervals. A systematic framework for modeling volatility in a time series is the Autoregressive Conditional Heteroscedastic (ARCH) model proposed by Engle (1982). An extension of this model, the Generalized Autoregressive Conditional Heteroscedastic (GARCH) model, was proposed by Bollerslev (1986). See also Bollerslev et al. (1992) for a more complete review of this subject. Due to the high temporal variability of PM10 concentration, it is usually found to have a time-varying conditional variance (see Chelani & Devotta (2005), among others). Volatility models are popular tools in the financial literature; however, only recently have they caught the attention of researchers interested in modeling time-varying variance in environmental time series, e.g., McAleer & Chan (2006).

Recently, time series with long-range dependence have been studied by several authors in different areas of application. In the time domain, long-range dependence is usually characterized by significant autocorrelations even between observations separated by a relatively long time period. The ARFIMA model (Granger & Joyeux, 1980; Hosking, 1981) is a time series model that accommodates the long-memory feature well. As discussed in the next section, this model has a parameter d, which governs the memory of the process. Several estimation methods for the long-memory parameter have been proposed; the most popular semiparametric estimator is due to Geweke & Porter-Hudak (1983), with variants proposed by Reisen (1994), among others. The usefulness of modeling long-memory time series by ARFIMA processes has been extensively studied, theoretically and empirically, in many areas, such as mathematics and economics. For a recent review of this subject, see Palma (2007). The properties of the long-memory parameter estimators have been extensively investigated under various model situations, such as the presence of non-Gaussian errors and outliers; see, e.g., Sena Jr. et al. (2006), Fajardo M. et al. (2009), among others.

However, in environmental science, and more specifically in the air pollution area, the use of the ARFIMA model has still not been well explored. Nowadays, a number of software packages make this model easy to use in applied work. Given the important features of the ARFIMA process, this model is likely to motivate much research in the environmental sciences in the near future. Iglesias et al. (2006) is an example of applied work with long-memory processes in the air pollution area: the authors investigated the use of an ARFIMA model to handle time series of PM2.5, PM10 and other gaseous pollutant concentrations.

A natural extension of the ARFIMA model to accommodate seasonal features is the seasonal ARFIMA model. Since the early 1990s, this model has caught the attention of researchers interested in long-memory time series with seasonal fractional parameters. Porter-Hudak (1990), among others, proposed the use of the Geweke & Porter-Hudak (1983) method for the estimation of seasonal ARFIMA processes. Generalizations of these seasonal long-memory models are the ARUMA and GARMA models, originally proposed by Giraitis & Leipus (1995) and Woodward et al. (1998), respectively. Reisen et al. (2006a, b) studied the seasonal ARFIMA model, which is a particular case of the ARUMA/GARMA models, and suggested long-memory estimators. Their empirical studies indicate the efficiency of these estimators when compared to other existing methods. Seasonality and long-memory properties have been explored theoretically and empirically in a large number of works; see, for example, Reisen et al. (2006a, b), Arteche & Robinson (2000), among others.

For a series that presents seasonal long-memory features with a time-varying conditional variance (volatility), one convenient extension is to consider the SARFIMA model with GARCH-type innovations. This model provides a useful way of analyzing a process exhibiting seasonal long memory with volatility. This is the main purpose of this paper, which proposes the use of a SARFIMA model with one non-seasonal and one seasonal fractional parameter and GARCH errors. The model is theoretically justified, and its usefulness is corroborated by an application to ambient PM10 concentrations.
The rest of this paper is organized as follows. Section 2 introduces the model, discusses its properties and summarizes the method used to estimate the parameters. Section 3 deals with the analysis and modeling of the PM10 contaminant and with forecasting issues. Some conclusions are drawn in Section 4.



2 The model and parameter estimation 

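A process $X = \{X_t\}_{t\in\mathbb{Z}}$ is defined as a zero-mean SARFIMA$(p, d, q)\times(P, D, Q)_s$ model with non-seasonal orders $p$ and $q$, seasonal orders $P$ and $Q$, difference parameters $d$ and $D$, and season length $s \in \mathbb{N}^* = \mathbb{N}\setminus\{0\}$ if

$$U_t = \nabla^{\mathbf{d}} X_t \qquad (1)$$

is a SARMA$(p, q)\times(P, Q)_s$ process. That is, the process $\{U_t\}_{t\in\mathbb{Z}}$ satisfies

$$\Phi(B^s)\,\phi(B)\,U_t = \Theta(B^s)\,\theta(B)\,\varepsilon_t, \qquad (2)$$

where $\{\varepsilon_t\}_{t\in\mathbb{Z}}$ is a white noise with $E(\varepsilon_t) = 0$ and $\mathrm{Var}(\varepsilon_t) = \sigma_\varepsilon^2$, and $B$ is the backward shift operator satisfying $BY_t = Y_{t-1}$ for any process $\{Y_t\}_{t\in\mathbb{Z}}$.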
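In (1), the operator $\nabla^{\mathbf{d}}$ is defined by

$$\nabla^{\mathbf{d}} = (1-B)^{d}(1-B^{s})^{D}, \qquad (3)$$

where $\mathbf{d} = (d, D) \in \mathbb{R}^2$ is the memory parameter vector, and $d$ and $D$ are the fractional parameters at the zero (long-run) and seasonal frequencies, respectively. Also, the fractional filters are

$$(1-B)^{d} = \sum_{j=0}^{\infty}\pi_j B^{j}, \quad \pi_j = \frac{\Gamma(j-d)}{\Gamma(j+1)\Gamma(-d)}, \qquad (1-B^{s})^{D} = \sum_{k=0}^{\infty}\pi_k^{(D)} B^{sk}, \quad \pi_k^{(D)} = \frac{\Gamma(k-D)}{\Gamma(k+1)\Gamma(-D)},$$

and $\Gamma(\cdot)$ is the well-known gamma function.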
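In practice the coefficients $\pi_j$ are usually computed without evaluating gamma functions at large arguments, through the ratio $\pi_j/\pi_{j-1} = (j-1-d)/j$, which follows from $\Gamma(x+1) = x\Gamma(x)$. The Python sketch below (function names are illustrative, not from the paper) checks this recursion against the gamma-function form; calling the same function with $D$ gives the seasonal coefficients $\pi_k^{(D)}$, which multiply $B^{sk}$.

```python
import math

def frac_diff_weights(d, n_terms):
    """First n_terms coefficients pi_j of (1 - z)^d = sum_j pi_j z^j.

    Recursion: pi_0 = 1 and pi_j = pi_{j-1} * (j - 1 - d) / j, which equals
    Gamma(j - d) / (Gamma(j + 1) * Gamma(-d)) for non-integer d.
    """
    weights = [1.0]
    for j in range(1, n_terms):
        weights.append(weights[-1] * (j - 1 - d) / j)
    return weights

def frac_diff_weight_gamma(d, j):
    # Direct gamma-function form; only usable for moderate j (overflow) and non-integer d.
    return math.gamma(j - d) / (math.gamma(j + 1) * math.gamma(-d))

# Example with d = 0.2606 (the estimate of d reported later in the paper).
w = frac_diff_weights(0.2606, 10)
for j in range(10):
    print(j, w[j], frac_diff_weight_gamma(0.2606, j))
```

The recursion is preferred for long filters, such as the ones needed to filter a series with more than a thousand observations, because `math.gamma` overflows for arguments above about 171.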

In (2), the polynomials $\Phi(\cdot)$, $\phi(\cdot)$, $\Theta(\cdot)$ and $\theta(\cdot)$ are given by

$$\Phi(z^s) = 1 - \Phi_1 z^{s} - \Phi_2 z^{2s} - \cdots - \Phi_P z^{Ps}, \qquad \phi(z) = 1 - \phi_1 z - \phi_2 z^{2} - \cdots - \phi_p z^{p},$$
$$\Theta(z^s) = 1 - \Theta_1 z^{s} - \Theta_2 z^{2s} - \cdots - \Theta_Q z^{Qs}, \qquad \theta(z) = 1 - \theta_1 z - \theta_2 z^{2} - \cdots - \theta_q z^{q}.$$

It is assumed that these polynomials have no common zeros and satisfy $\Phi(z^s)\phi(z) \neq 0$ and $\Theta(z^s)\theta(z) \neq 0$ for $|z| = 1$. Furthermore, in the above equations, $(\Phi_i)_{1\le i\le P}$, $(\Theta_j)_{1\le j\le Q}$, $(\phi_k)_{1\le k\le p}$ and $(\theta_\ell)_{1\le \ell\le q}$ are unknown parameters. For more details, see, for example, Palma & Chan (2005), Giraitis & Leipus (1995), among others. If $|d + D| < 1/2$ and $|D| < 1/2$, $X_t$ is a stationary and invertible process and, at a seasonal frequency $\omega_s \in [-\pi, \pi]$ (a nonzero multiple of $2\pi/s$), the spectral density becomes unbounded and behaves as



$$f(\omega + \omega_s) \sim C\,|s\omega|^{-2D}\left|2\sin\frac{\omega_s}{2}\right|^{-2d}, \quad \omega \to 0, \qquad (4)$$

where $C$ is a positive constant.

Granger & Joyeux (1980) and Hosking (1981) proposed the ARFIMA$(p, d, q)$ model, which is a particular case of the SARFIMA model (Eq. (1) and (2)) with $P = Q = D = 0$. ARFIMA models are commonly used to model time series with long-memory behavior and have the following characteristics: the ARFIMA process is stationary and invertible when $|d| < 0.5$; $d > 0$ characterizes long-memory dependence; $d = 0$ and $d < 0$ indicate that the process has short and intermediate dependence, respectively. The spectral density function of the ARFIMA model has the form $f(\omega) \sim C|\omega|^{-2d}$ as $\omega \to 0$, where $C$ is a positive constant. The correlation between $X_t$ and $X_{t+k}$ satisfies $\rho(k) \sim k^{2d-1}$ (up to a multiplicative constant) as $k \to \infty$. To estimate $d$ within a semiparametric framework, the method proposed by Geweke & Porter-Hudak (1983) (GPH) was the pioneering one, and it has been widely used in the literature. Based on the GPH method, other estimators of $d$ were proposed, for example, by Reisen (1994), Arteche & Robinson (2000) and Reisen et al. (2010). Here, the GPH method is the basis of the tool used to estimate the seasonal and non-seasonal fractional parameters.
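For the simplest case, ARFIMA(0, d, 0), the autocorrelations have the closed form $\rho(k) = \frac{\Gamma(k+d)\Gamma(1-d)}{\Gamma(k-d+1)\Gamma(d)}$ (Hosking, 1981), so the hyperbolic decay $\rho(k) \sim k^{2d-1}$ can be checked numerically. The Python sketch below, an illustration for the pure fractional case only and not the SARFIMA model of the paper, uses the equivalent recursion $\rho(k) = \rho(k-1)(k-1+d)/(k-d)$ and compares $\rho(k)$ with $\frac{\Gamma(1-d)}{\Gamma(d)}k^{2d-1}$.

```python
import math

def arfima0d0_acf(d, max_lag):
    """Autocorrelations rho(0), ..., rho(max_lag) of a stationary ARFIMA(0, d, 0).

    Uses rho(k) = rho(k - 1) * (k - 1 + d) / (k - d), equivalent (for d != 0) to
    Gamma(k + d) Gamma(1 - d) / (Gamma(k - d + 1) Gamma(d)).
    """
    if not -0.5 < d < 0.5:
        raise ValueError("stationarity requires |d| < 0.5")
    rho = [1.0]
    for k in range(1, max_lag + 1):
        rho.append(rho[-1] * (k - 1 + d) / (k - d))
    return rho

d = 0.2606
rho = arfima0d0_acf(d, 10_000)
const = math.gamma(1 - d) / math.gamma(d)
for k in (10, 100, 1000, 10_000):
    # The ratio rho(k) / (const * k^(2d - 1)) should approach 1 as k grows.
    print(k, rho[k], rho[k] / (const * k ** (2 * d - 1)))
```

The slow, hyperbolic decay (rather than the geometric decay of an ARMA process) is what produces significant sample autocorrelations at long lags.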

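Let $\{X_1, \ldots, X_n\}$ be a sample from the process $X_t$ (Eq. (1)). Reisen et al. (2006a, b) suggested a slight modification of the Geweke & Porter-Hudak (1983) method to estimate the parameters $d$ and $D$ of a seasonal ARFIMA process (Eq. (1)). For the Fourier frequencies $\omega_j = 2\pi j/n$, $1 \le j \le M$, where $M$ is an integer bandwidth and $\lfloor x \rfloor$ denotes the greatest integer smaller than or equal to $x$, the estimation method consists of obtaining the estimator $\hat{\mathbf{d}} = (\hat d, \hat D)$ by least squares from the approximate multiple linear regression equation

$$\log I(\omega_j) = c - d\,\log\left[2\sin\left(\frac{\omega_j}{2}\right)\right]^2 - D\,\log\left[2\sin\left(\frac{s\,\omega_j}{2}\right)\right]^2 + \xi_j, \quad j = 1, \ldots, M, \qquad (5)$$

where $I(\omega_j)$ is the periodogram of the sample at $\omega_j$, $c$ is a constant and $\xi_j$ is an error term.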
Under some model conditions, Reisen et al. (2010) establish that

where $\mathbf{b}$ and $W$ are a vector and a $2 \times 2$ matrix of constants, respectively, and $M$ is the bandwidth in Equation (5), which satisfies

$$\left(\frac{M}{n}\right)\log M + \frac{1}{M} \to 0, \quad \text{as } n \to \infty.$$
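A minimal Python sketch of the regression in Eq. (5) follows: $\log I(\omega_j)$ is regressed by ordinary least squares on a constant, $-\log[2\sin(\omega_j/2)]^2$ and $-\log[2\sin(s\omega_j/2)]^2$, with the periodogram taken as $I(\omega_j) = |\sum_t (X_t - \bar X)e^{-it\omega_j}|^2/(2\pi n)$. Fourier frequencies where $\sin(s\omega_j/2) = 0$ (exact seasonal frequencies, where the seasonal regressor is undefined) are skipped; this handling, like the function names, is an assumption of the sketch and not necessarily the choice made by Reisen et al. (2006a, b). A direct $O(nM)$ DFT keeps the code self-contained.

```python
import cmath
import math

def periodogram(x, js):
    """I(w_j) = |sum_t (x_t - mean) exp(-i t w_j)|^2 / (2 pi n), w_j = 2 pi j / n."""
    n = len(x)
    mean = sum(x) / n
    out = []
    for j in js:
        w = 2 * math.pi * j / n
        dft = sum((xt - mean) * cmath.exp(-1j * t * w) for t, xt in enumerate(x))
        out.append(abs(dft) ** 2 / (2 * math.pi * n))
    return out

def gph_design(n, s, M):
    """Rows (1, -log[2 sin(w/2)]^2, -log[2 sin(s w/2)]^2) of Eq. (5) for j = 1..M."""
    if not 1 <= M <= n // 2:
        raise ValueError("need 1 <= M <= n // 2")
    js, rows = [], []
    for j in range(1, M + 1):
        if (s * j) % n == 0:  # sin(s w_j / 2) = 0: seasonal regressor undefined
            continue
        w = 2 * math.pi * j / n
        rows.append((1.0,
                     -math.log((2 * math.sin(w / 2)) ** 2),
                     -math.log((2 * math.sin(s * w / 2)) ** 2)))
        js.append(j)
    return js, rows

def least_squares(rows, y):
    """Solve the normal equations X'X b = X'y by Gaussian elimination."""
    p = len(rows[0])
    A = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    v = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda i: abs(A[i][c]))  # partial pivoting
        A[c], A[piv] = A[piv], A[c]
        v[c], v[piv] = v[piv], v[c]
        for i in range(c + 1, p):
            f = A[i][c] / A[c][c]
            for k in range(c, p):
                A[i][k] -= f * A[c][k]
            v[i] -= f * v[c]
    b = [0.0] * p
    for c in reversed(range(p)):
        b[c] = (v[c] - sum(A[c][k] * b[k] for k in range(c + 1, p))) / A[c][c]
    return b

def seasonal_gph(x, s, M):
    """Return (d_hat, D_hat) from the log-periodogram regression of Eq. (5)."""
    js, rows = gph_design(len(x), s, M)
    y = [math.log(v) for v in periodogram(x, js)]
    _, d_hat, D_hat = least_squares(rows, y)
    return d_hat, D_hat
```

Near the origin the two regressors are almost collinear (both behave like $\log\omega_j^2$), so the choice of $M$ strongly affects how well $d$ and $D$ are separated, which is consistent with the sensitivity of $\hat D$ to the bandwidth reported in Table 1.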

The high variability of the data suggests that PM10 has a time-varying conditional variance (Chelani & Devotta, 2005). Thus, it may be useful to model PM10 with a statistical tool that incorporates seasonality, long memory and heteroscedasticity. Hence, the SARFIMA process defined in Eq. (1) and (2), with heteroscedastic errors, is the candidate model to fit and forecast daily average concentrations of PM10.

Due to the extensive literature on the application of ARFIMA and GARCH models to time series with long-memory and heteroscedasticity features, the ARFIMA process with GARCH innovations has become a popular tool in practical data analysis. This model was the main motivation of the work of Ling & Li (1997). The authors introduced the ARFIMA$(p, d, q)$-GARCH$(r, m)$ model, where $p, q, r, m \in \mathbb{N}^*$ and $d \in \mathbb{R}$, and presented properties of the model and of its maximum likelihood estimator. Independently, Sena Jr. et al. (2006) investigated empirically the ARFIMA$(p, d, q)$-GARCH$(r, m)$ model with parametric and semiparametric procedures to estimate the parameters of the ARFIMA part. Baillie et al. (1996) analyzed inflation series from 10 countries with the ARFIMA-GARCH methodology and suggested a procedure to obtain approximate maximum likelihood estimates of an ARFIMA-GARCH model. These works give strong support for using the ARFIMA model in practical applications even when the errors are heteroscedastic. Based on this discussion, the seasonal model defined in Eq. (1) and (2) can be extended to a seasonal model with heteroscedastic errors, such as a GARCH$(r, m)$ process. The model that incorporates these characteristics is hereafter denoted SARFIMA$(p, d, q)\times(P, D, Q)_s$-GARCH$(r, m)$, where now $\{\varepsilon_t\}$ in Eq. (2) has the following structure

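$$\varepsilon_t \mid \mathcal{F}_{t-1} \sim \mathcal{D}(0, h_t), \qquad h_t = \alpha_0 + \sum_{i=1}^{m}\alpha_i\varepsilon_{t-i}^2 + \sum_{j=1}^{r}\beta_j h_{t-j}, \qquad (7)$$

where $m, r \in \mathbb{N}^*$ are the model orders, $\alpha_0 > 0$, $\alpha_i \ge 0$ and $\beta_j \ge 0$ for $i = 1, \ldots, m$ and $j = 1, \ldots, r$, and $\mathcal{F}_{t-1}$ denotes the $\sigma$-field generated by the past information $\{\varepsilon_{t-1}, \varepsilon_{t-2}, \ldots\}$. Above, $\mathcal{D}$ is the probability distribution of a continuous random variable, for example, the normal or Student-t distribution.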
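To make Eq. (7) concrete, the Python sketch below simulates GARCH(1,1) errors ($m = r = 1$) with a standard normal $\mathcal{D}$, starting the recursion at the unconditional variance $\alpha_0/(1 - \alpha_1 - \beta_1)$ and discarding a burn-in period. The normal choice, the burn-in length and the function name are assumptions made for illustration.

```python
import math
import random

def simulate_garch11(n, alpha0, alpha1, beta1, burn=500, seed=None):
    """Return (eps, h): eps_t = sqrt(h_t) z_t with z_t i.i.d. N(0, 1) and
    h_t = alpha0 + alpha1 * eps_{t-1}^2 + beta1 * h_{t-1}  (Eq. (7), m = r = 1)."""
    if alpha0 <= 0 or alpha1 < 0 or beta1 < 0 or alpha1 + beta1 >= 1:
        raise ValueError("need alpha0 > 0, alpha1, beta1 >= 0 and alpha1 + beta1 < 1")
    rng = random.Random(seed)
    h = alpha0 / (1.0 - alpha1 - beta1)  # start at the unconditional variance
    eps_prev = 0.0
    eps, hs = [], []
    for t in range(n + burn):
        h = alpha0 + alpha1 * eps_prev ** 2 + beta1 * h
        e = math.sqrt(h) * rng.gauss(0.0, 1.0)
        if t >= burn:  # keep only post-burn-in values
            eps.append(e)
            hs.append(h)
        eps_prev = e
    return eps, hs

# Illustrative parameter values (the GARCH(1,1) estimates reported in Table 2).
eps, h = simulate_garch11(2000, 1.6464, 0.0677, 0.9205, seed=1)
```

With $\alpha_1 + \beta_1$ close to 1, as in this example, a large shock raises $h_t$ for many subsequent periods, producing the clusters of high volatility that the model is meant to capture.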

Combining the model properties in Reisen et al. (2006a, b) with Theorem 2.3 of Ling & Li (1997), the following proposition is established for the SARFIMA$(p, d, q)\times(P, D, Q)_s$-GARCH$(r, m)$ model.

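Proposition 1. Let $X_t$ be generated by Eq. (1) and (2) with $\varepsilon_t$ given by (7), where $\sum_{i=1}^{m}\alpha_i + \sum_{j=1}^{r}\beta_j < 1$. Suppose that the polynomials $\Phi(z^s)\phi(z)$ and $\Theta(z^s)\theta(z)$ in (2) have no common zeros and that $\mathbf{d}$ in (3) satisfies $|d + D| < 1/2$ and $|D| < 1/2$. Then the following statements hold.

(a) If $\Phi(z^s)\phi(z) \neq 0$ for $|z| = 1$, then $X_t$ is second-order stationary and has the unique representation

$$X_t = \sum_{j=0}^{\infty}\psi_j\,\varepsilon_{t-j}, \qquad (8)$$

where the $\psi_j$ are determined by the Laurent expansion

$$\frac{\Theta(z^s)\theta(z)}{\Phi(z^s)\phi(z)(1-z)^{d}(1-z^{s})^{D}} = \sum_{j=0}^{\infty}\psi_j z^{j}$$

in some annulus of $|z| = 1$. Hence, $X_t$ is strictly stationary and ergodic.

(b) If $\Theta(z^s)\theta(z) \neq 0$ for $|z| \le 1$, then $X_t$ is invertible and

$$\varepsilon_t = \frac{\Phi(B^s)\phi(B)}{\Theta(B^s)\theta(B)}\sum_{j=0}^{\infty}\psi_j^{*}X_{t-j},$$

where the $\psi_j^{*}$ are the coefficients of $(1-z)^{d}(1-z^{s})^{D} = \sum_{j\ge 0}\psi_j^{*}z^{j}$, given by

$$\psi_j^{*} = \pi_j + \sum_{i=1}^{\lfloor j/s\rfloor}\pi_i^{(D)}\,\pi_{j-si}, \qquad (9)$$

with

$$\pi_j = \frac{\Gamma(j-d)}{\Gamma(j+1)\Gamma(-d)}, \quad j = 0, 1, \ldots, \qquad \pi_k^{(D)} = \frac{\Gamma(k-D)}{\Gamma(k+1)\Gamma(-D)}, \quad k = 0, 1, \ldots,$$

where $\Gamma(\cdot)$ is the gamma function.

(c) The spectral density of $\{X_t\}$ is given by

$$f_X(\omega) = f_U(\omega)\left|2\sin\frac{\omega}{2}\right|^{-2d}\left|2\sin\frac{s\omega}{2}\right|^{-2D}, \quad \omega \in [-\pi, \pi], \qquad (10)$$

where $f_U(\cdot)$ is the spectral density of the stationary SARMA process $\{U_t\}$ and $U_t = \nabla^{\mathbf{d}}X_t$.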

The proof of this proposition is given in the Appendix. 
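Eq. (10) can be evaluated directly once $f_U$ is specified. The Python sketch below assumes, for illustration, the SARFIMA$(0, d, 1)\times(0, D, 1)_s$ structure used later in the paper, with $\theta(z) = 1 - \theta_1 z$ and $\Theta(z^s) = 1 - \Theta_1 z^s$, so that $f_U(\omega) = \frac{\sigma^2}{2\pi}|1 - \theta_1 e^{-i\omega}|^2|1 - \Theta_1 e^{-is\omega}|^2$. With GARCH errors satisfying the condition of Proposition 1, $\sigma^2$ can be taken as the unconditional error variance $\alpha_0/(1 - \sum_i\alpha_i - \sum_j\beta_j)$. The parameter values are placeholders and the function name is hypothetical.

```python
import cmath
import math

def sarfima_0d1_0D1_spectrum(w, d, D, s, theta1, Theta1, sigma2):
    """f_X(w) of Eq. (10) for a SARFIMA(0,d,1)x(0,D,1)_s model; w must not be a
    multiple of 2*pi/s (including 0), where the density is unbounded."""
    ma = abs(1 - theta1 * cmath.exp(-1j * w)) ** 2          # |theta(e^{-iw})|^2
    sma = abs(1 - Theta1 * cmath.exp(-1j * s * w)) ** 2     # |Theta(e^{-isw})|^2
    f_u = sigma2 / (2 * math.pi) * ma * sma                 # SARMA part f_U(w)
    return (f_u
            * abs(2 * math.sin(w / 2)) ** (-2 * d)
            * abs(2 * math.sin(s * w / 2)) ** (-2 * D))

# Placeholder values: the density grows as w approaches the seasonal frequency 2*pi/7.
w_s = 2 * math.pi / 7
for delta in (0.2, 0.02, 0.002):
    print(delta, sarfima_0d1_0D1_spectrum(w_s + delta, 0.26, 0.22, 7, 0.14, -0.11, 1.0))
```

This unbounded growth at $\omega_s$ is the spectral signature that the periodogram regression of Eq. (5) exploits.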

The next section presents the analysis of daily average PM10 concentrations based on the SARFIMA$(p, d, q)\times(P, D, Q)_s$-GARCH$(r, m)$ model introduced above.
3 Analysis and results of modeling PM10 concentration



As previously mentioned, the daily average PM10 concentration is the data set analyzed here to illustrate the methodology. The series is expressed in μg/m³ and was observed in Cariacica, which belongs to the Metropolitan Region of Greater Vitoria (RGV), ES, Brazil. RGV comprises five cities with a population of approximately 1.7 million inhabitants in an area of 1,437 km². The region is situated on the South Atlantic coast of Brazil (latitude 20°19′S, longitude 40°20′W) and has a humid tropical climate, with average temperatures ranging between 24 °C and 30 °C.

The raw series has a sample size of 1826 observations, measured from January 1st, 2005 to December 31st, 2009. The series has sample mean $\bar X = 43.81$ μg/m³ and is shown in Figure 1. The highest concentrations are generally observed in the winter months, from July to September, and the data appear to be stationary in the mean level with a strong seasonal pattern, as expected. In addition, there is considerable evidence that the conditional variance is not constant over time, so conditional volatility models seem to be an appropriate choice for capturing the time-varying volatility in the level of PM10 concentration. For modeling purposes, the time series is divided into two parts: learning and prediction sets. The 1603 observations from January 1st, 2005 until May 22nd, 2009 are used as the learning set, and the remaining 223 observations are used for the prediction study (these observations are represented by a dashed line in Figure 1).




Figure 1: Daily PM10 concentration in μg/m³ from 01/01/2005 to 12/31/2009.

The sample autocorrelation (ACF) and partial autocorrelation (PACF) functions of PM10 are shown in Figures 2(a) and 2(b), respectively. These plots clearly show the presence of seasonal behavior with period $s = 7$, which is expected since the series was observed daily (a weekly cycle). The frequency-domain counterpart of the sample ACF is the periodogram, presented in Figure 2(c). The sample spectrum has peaks at frequencies very close to zero and also at frequencies that are multiples of 1/7 (in cycles per day).



Figure 2: ACF (a), PACF (b) and periodogram (c) of the PM10 dataset.
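The sample ACF in Figure 2(a) is the standard estimator $\hat\rho(k) = \sum_{t=1}^{n-k}(X_t - \bar X)(X_{t+k} - \bar X)/\sum_{t=1}^{n}(X_t - \bar X)^2$. As a hedged illustration with a synthetic series (not the PM10 data), the Python sketch below computes it for a daily series with a weekly cycle, where the largest positive autocorrelations appear at lags 7, 14 and 21, together with the usual approximate 95% band $\pm 1.96/\sqrt{n}$.

```python
import math
import random

def sample_acf(x, max_lag):
    """Sample autocorrelations rho_hat(0..max_lag) with the usual 1/n normalization."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x) / n
    acf = []
    for k in range(max_lag + 1):
        ck = sum((x[t] - m) * (x[t + k] - m) for t in range(n - k)) / n
        acf.append(ck / c0)
    return acf

# Synthetic daily series with a weekly (s = 7) cycle plus Gaussian noise.
rng = random.Random(42)
x = [40 + 5 * math.cos(2 * math.pi * t / 7) + rng.gauss(0, 2) for t in range(1000)]
acf = sample_acf(x, 21)
band = 1.96 / math.sqrt(len(x))
print([round(acf[k], 2) for k in (1, 3, 7, 14, 21)], round(band, 3))
```

A purely deterministic cycle like this one gives a non-decaying seasonal ACF; the slowly decaying pattern seen in the PM10 data is what points to seasonal long memory instead.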

An interesting feature of the sample ACF is the positive, significant and slowly decaying sample autocorrelations at the first lags, at lags that are multiples of 7 and at the lags between the seasonal periods. This may indicate a long-memory effect in the data with positive seasonal and non-seasonal fractional parameters. This is also suggested by the periodogram, which has significant peaks at the long-run frequency and at the seasonal frequencies. These plots corroborate the need for a model that adequately describes the seasonal and non-seasonal long-memory behaviors. However, the presence of short-memory parameters is not clear from these plots alone.

The empirical evidence described above motivates the use of the SARFIMA model defined previously. The SARFIMA modeling strategy follows the steps suggested in Hosking (1981) and investigated empirically by Reisen (1994) and Reisen & Lopes (1999), among others. First, the fractional parameters are estimated using the semiparametric tool described in the previous section. This was carried out with different bandwidths $M$, determined as a function of $n^{\alpha}$, $0 < \alpha < 1$. Second, the truncated filter $(1-B)^{\hat d}(1-B^{s})^{\hat D}$ is applied to the observations to obtain a new series, which here approximately follows an SMA$(1)\times(1)_7$ model. This new series is used to identify the complete short-memory model structure. The estimated models and their accuracy are discussed in the next subsections. All estimates were computed using the R programming language.

3.1 Fitted models

Table 1 presents the memory estimates obtained with different bandwidths ($M$). The values in brackets are the standard deviations. The estimates of the long-run component, described by the fractional differencing parameter $d$, are stable across the bandwidth values. In contrast, larger values of $M$ yield smaller estimates of the seasonal parameter $D$. This decrease of $\hat D$ as $M$ grows may indicate that a seasonal short-memory component contributes to the model. Since the effect of the seasonal and non-seasonal short-run components cannot be avoided in the fractional estimates, the regression equation should be estimated with fewer periodogram ordinates at the zero and seasonal frequencies. Thus, the fractional estimates were chosen for $\alpha = 0.78$ ($M = 26$), that is, $\hat d = 0.2606$ and $\hat D = 0.2223$. Note that the stationarity conditions of the model are satisfied, since $0 < \hat d + \hat D = 0.4829 < 0.5$ and $|\hat D| < 0.5$.

To obtain an approximation of $U_t$ (Eq. (2)), the observations were filtered by $\nabla^{\hat{\mathbf{d}}}$ truncated at $n = 1603$. The new series is $\hat U_t = \sum_{j=0}^{t-1}\hat\psi_j^{*}(X_{t-j} - \bar X)$, $t = 1, \ldots, 1603$, where the $\hat\psi_j^{*}$ are the estimated coefficients $\psi_j^{*}$ obtained according to (9) in Proposition 1. As an example of the impact of $X_{t-j}$, for large $j$, in this infinite AR representation, $\hat\psi_{1603}^{*} \approx 10^{-5}$ ($\hat\psi_{1603}^{*} = 0.00001340581$), which is nearly zero. Since the observations are of the order of $10^{1}$ μg/m³, the contribution of $X_{t-j}$ becomes negligible for large $j$.
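The filtering step can be sketched as follows: the coefficients $\psi_j^{*}$ of $(1-z)^{d}(1-z^{s})^{D}$ are obtained from Eq. (9) by convolving the non-seasonal weights $\pi_j$ with the seasonal weights $\pi_k^{(D)}$ placed at lags $sk$, and the demeaned series is filtered using only the available past observations. The function names are illustrative; the paper's computations were done in R, and this Python version is an assumed re-implementation, not the authors' code.

```python
def frac_weights(d, n_terms):
    """pi_0..pi_{n_terms-1} of (1 - z)^d via pi_j = pi_{j-1} * (j - 1 - d) / j."""
    w = [1.0]
    for j in range(1, n_terms):
        w.append(w[-1] * (j - 1 - d) / j)
    return w

def sarfima_filter_weights(d, D, s, n_terms):
    """psi*_j of (1 - z)^d (1 - z^s)^D, j = 0..n_terms-1, following Eq. (9):
    psi*_j = pi_j + sum_{i=1}^{floor(j/s)} pi_i^(D) * pi_{j - s i}."""
    pi_d = frac_weights(d, n_terms)
    pi_D = frac_weights(D, (n_terms - 1) // s + 1)
    # The i = 0 term contributes pi_D[0] * pi_d[j] = pi_j.
    return [sum(pi_D[i] * pi_d[j - s * i] for i in range(j // s + 1))
            for j in range(n_terms)]

def truncated_filter(x, psi):
    """U_t = sum_{j=0}^{t} psi*_j (x_{t-j} - mean(x)) for 0-based t, i.e. the filter
    is truncated at the first observation; psi needs at least len(x) entries."""
    m = sum(x) / len(x)
    xc = [v - m for v in x]
    return [sum(psi[j] * xc[t - j] for j in range(t + 1)) for t in range(len(x))]

# Estimates from Table 1 (alpha = 0.78), with s = 7 and n = 1603 observations.
psi = sarfima_filter_weights(0.2606, 0.2223, 7, 1603)
print(psi[:8])
```

The double loop costs $O(n^2/s)$ operations for the weights and $O(n^2)$ for the filter, which is negligible for $n = 1603$.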

Figures 3(a) and 3(b) present the sample autocorrelation and partial autocorrelation functions of $\hat U_t$, respectively. These plots indicate that a seasonal moving-average SMA$(1)\times(1)_7$ model may be adequate to describe $\hat U_t$. This model order was corroborated by the AIC criterion and residual analysis.

Table 1: Estimates of d and D for different bandwidths M, indexed by α.

α      M     d        (sd(d))    D        (sd(D))
0.98   99    0.2791   (0.0268)   0.1219   (0.0292)
0.96   87    0.2714   (0.0276)   0.1123   (0.0307)
0.94   76    0.2623   (0.0287)   0.1157   (0.0331)
0.92   66    0.2639   (0.0298)   0.1187   (0.0355)
0.90   58    0.2645   (0.0310)   0.1282   (0.0383)
0.88   51    0.2496   (0.0319)   0.1423   (0.0410)
0.86   44    0.2570   (0.0325)   0.1581   (0.0438)
0.84   39    0.2676   (0.0331)   0.1728   (0.0463)
0.82   34    0.2707   (0.0339)   0.1704   (0.0496)
0.80   29    0.2634   (0.0355)   0.1923   (0.0547)
0.78   26    0.2606   (0.0372)   0.2223   (0.0596)
0.76   22    0.2641   (0.0382)   0.2550   (0.0647)
Figure 3: The ACF (a) and PACF (b) of $\hat U_t = (1-B)^{0.2606}(1-B^{7})^{0.2223}(X_t - \bar X)$.

Therefore, the SARFIMA$(0, d, 1)\times(0, D, 1)_7$ model was chosen for the average PM10 data. The standard residual analysis did not reveal any anomaly in the residuals of this model: most of their sample autocorrelations fall inside the confidence bounds, so the residuals appear to be uncorrelated. These results are not presented here to save space but are available upon request. However, the plot in Figure 4(a) clearly indicates that the variance of the errors is not constant. Furthermore, Figures 4(b) and 4(c) show, respectively, the ACF and PACF of $\hat\varepsilon_t^2$, and they suggest that a generalized autoregressive conditional heteroscedasticity (GARCH) model can be suitable to capture the time-varying volatility in the data.

In order to statistically verify the presence of heteroscedasticity in $\hat\varepsilon_t$, the Lagrange multiplier test was performed (Lee, 1991), and the null hypothesis of residual homoscedasticity was rejected with a p-value smaller than 0.001. After model adequacy checks, a GARCH(1,1) model was fitted to the residuals $\hat\varepsilon_t$ of the SARFIMA model. The final estimated model is a SARFIMA$(0, d, 1)\times(0, D, 1)_7$-GARCH(1,1). The parameter estimates are displayed in Table 2.
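The paper cites the Lagrange multiplier test of Lee (1991). As a hedged illustration of the same idea, the Python sketch below implements Engle's (1982) ARCH LM test, a standard and closely related form: regress $\hat\varepsilon_t^2$ on a constant and $q$ of its own lags and compare $nR^2$ with a $\chi^2_q$ distribution. Whether this coincides exactly with the statistic used in the paper is an assumption; the ARCH(1) data in the usage lines are simulated, not PM10 residuals.

```python
import math
import random

def ols_r2(X, y):
    """R^2 of the least-squares regression of y on the columns of X."""
    p = len(X[0])
    A = [[sum(r[a] * r[b] for r in X) for b in range(p)] for a in range(p)]
    v = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(p)]
    for c in range(p):  # Gaussian elimination with partial pivoting
        piv = max(range(c, p), key=lambda i: abs(A[i][c]))
        A[c], A[piv] = A[piv], A[c]
        v[c], v[piv] = v[piv], v[c]
        for i in range(c + 1, p):
            f = A[i][c] / A[c][c]
            for k in range(c, p):
                A[i][k] -= f * A[c][k]
            v[i] -= f * v[c]
    b = [0.0] * p
    for c in reversed(range(p)):
        b[c] = (v[c] - sum(A[c][k] * b[k] for k in range(c + 1, p))) / A[c][c]
    ybar = sum(y) / len(y)
    sst = sum((yi - ybar) ** 2 for yi in y)
    sse = sum((yi - sum(bk * xk for bk, xk in zip(b, r))) ** 2 for r, yi in zip(X, y))
    return 1.0 - sse / sst

def chi2_sf(x, k):
    """P(chi^2_k > x) from the series of the regularized lower incomplete gamma."""
    if x <= 0:
        return 1.0
    a, z = k / 2.0, x / 2.0
    if z > 600 and z > 10 * a:  # far in the upper tail; avoids float overflow
        return 0.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return min(1.0, max(0.0, 1.0 - lower))

def arch_lm_test(resid, q=1):
    """Engle's LM statistic n * R^2 and its chi^2_q p-value."""
    e2 = [e * e for e in resid]
    y = e2[q:]
    X = [[1.0] + [e2[t - i] for i in range(1, q + 1)] for t in range(q, len(e2))]
    stat = len(y) * ols_r2(X, y)
    return stat, chi2_sf(stat, q)

# Simulated ARCH(1) errors: eps_t = sqrt(1 + 0.5 eps_{t-1}^2) z_t.
rng = random.Random(7)
e, resid = 0.0, []
for _ in range(3000):
    e = math.sqrt(1.0 + 0.5 * e * e) * rng.gauss(0.0, 1.0)
    resid.append(e)
print(arch_lm_test(resid, q=1))
```

A small p-value, as obtained for the SARFIMA residuals in the paper, indicates that the squared residuals are predictable from their own past, i.e. conditional heteroscedasticity.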
Figure 4: Plots related to the volatility of PM10 concentration: (a) squared residuals (volatility); (b) ACF and (c) PACF of the squared residuals.
Table 2: SARFIMA-GARCH parameter estimates of PM10 concentration

Parameter     Estimate    s.d.      t-test     p-value
$d$            0.2606     0.0372     7.0054    < 0.0001
$D$            0.2223     0.0596     3.7299      0.0002
$\theta_1$     0.1417     0.0258     5.4923    < 0.0001
$\Theta_1$    -0.1092     0.0265    -4.1208    < 0.0001
$\alpha_0$     1.6464     0.5623     2.9280      0.0034
$\alpha_1$     0.0677     0.0111     6.0991    < 0.0001
$\beta_1$      0.9205     0.0132    69.735     < 0.0001
Figure 5: Plots of the residuals of the fitted GARCH(1,1) model: (a) histogram; (b) ACF.



The adequacy of the GARCH(1,1) model is now discussed. Figures 5(a) and 5(b) present the histogram and the ACF of the residuals of the fitted GARCH model. As a first analysis, these figures indicate that the residuals are uncorrelated and that the histogram is slightly positively skewed. A more detailed investigation follows. Summary statistics of these residuals are given in Tables 3 and 4. They confirm that the residuals are uncorrelated and not normally distributed, which was expected since the original data are also right-skewed.

Table 3: Some statistics of the residuals of the fitted volatility model

Mean      Std. dev.   Skewness   Kurtosis
0.0128    0.9994      0.4277     0.8718



Table 4: p-values of the tests for normality (*) and non-correlation (**)

Shapiro-Wilk*   Jarque-Bera*   Box-Pierce**   Ljung-Box**
< 0.0001        < 0.0001       0.1151+        0.1138+

+These p-values correspond to the Box-Pierce and Ljung-Box test statistics with lag 8.

To conclude the adequacy analysis, Figure 6 presents a visual assessment of the fitted SARFIMA model, namely the one-step-ahead predicted values from 2008 onwards, which indicates a reasonably good performance of the proposed model. It can be seen that the model was able to capture the trend and seasonality of the series.
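The normality and non-correlation checks in Table 4 can be sketched as follows, assuming the textbook forms of the statistics: Jarque-Bera $JB = \frac{n}{6}\left(S^2 + \frac{(K-3)^2}{4}\right)$ compared with $\chi^2_2$; Box-Pierce $Q_{BP} = n\sum_{k=1}^{h}\hat\rho_k^2$ and Ljung-Box $Q_{LB} = n(n+2)\sum_{k=1}^{h}\hat\rho_k^2/(n-k)$ compared with $\chi^2_h$. For even degrees of freedom (here $h = 8$, as in Table 4) the $\chi^2$ tail probability has the closed form used below. No degrees-of-freedom correction for fitted parameters is applied; this is an assumption, as the paper does not state one. The residual series in the usage lines is hypothetical.

```python
import math

def skew_excess_kurtosis(x):
    """Sample skewness and excess kurtosis using 1/n moments."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

def chi2_sf_even(x, k):
    """P(chi^2_k > x) for even k: exp(-x/2) * sum_{i < k/2} (x/2)^i / i!."""
    if k <= 0 or k % 2:
        raise ValueError("closed form requires a positive even k")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, k // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

def jarque_bera(x):
    """Jarque-Bera statistic and its chi^2_2 p-value."""
    skew, ex_kurt = skew_excess_kurtosis(x)
    stat = len(x) / 6.0 * (skew ** 2 + ex_kurt ** 2 / 4.0)
    return stat, chi2_sf_even(stat, 2)

def box_tests(x, h):
    """Box-Pierce and Ljung-Box statistics at lag h, each with a chi^2_h p-value."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x) / n
    rho = [sum((x[t] - m) * (x[t + k] - m) for t in range(n - k)) / n / c0
           for k in range(1, h + 1)]
    bp = n * sum(r * r for r in rho)
    lb = n * (n + 2) * sum(r * r / (n - k) for k, r in enumerate(rho, start=1))
    return (bp, chi2_sf_even(bp, h)), (lb, chi2_sf_even(lb, h))

resid = [math.sin(1.7 * t) + 0.3 * math.cos(0.2 * t * t) for t in range(500)]
print(jarque_bera(resid))
print(box_tests(resid, 8))
```

The Ljung-Box weights $(n+2)/(n-k)$ make $Q_{LB}$ slightly larger than $Q_{BP}$, which is why its p-value in Table 4 is slightly smaller.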



Figure 6: PM10 concentration and its one-step-ahead predicted values from 01/01/2008 to 12/31/2009.

3.2 Forecasting issues 

This section examines the forecast performance of the model discussed in this paper, with confidence intervals built with homoscedastic and heteroscedastic variances. As stated before, the observations from May 23rd, 2009 to December 31st, 2009 (223 observations) were excluded from the modeling step to be used for an out-of-sample one-step-ahead forecast study. To measure the accuracy of the forecasts, the criteria used were the Mean Percentage Error (MPE) and the Mean Absolute Percentage Error (MAPE). To quantify the performance of the forecast intervals, the Coverage Percentages of the GARCH and Homoscedastic Forecast Intervals, denoted by CPGFI and CPHFI, respectively, were calculated. These quantities are reported in Table 5. The MPE and MAPE criteria indicate that the proposed SARFIMA model gives reasonably accurate forecasts. Furthermore, the coverage percentage of the homoscedastic forecast interval, CPHFI = 91.03%, is below the nominal confidence level of 95%. On the other hand, CPGFI = 94.17% is very close to the nominal level. This suggests that the SARFIMA-GARCH model accommodates well the properties of the daily average PM10 concentration data set analyzed in this paper.

Table 5: Forecast performance of the selected model

Criterion   MPE      MAPE      CPGFI     CPHFI
Value       8.46%    23.85%    94.17%    91.03%
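The criteria in Table 5 can be computed as sketched below, assuming the common definitions $\mathrm{MPE} = \frac{100}{n}\sum_t (y_t - \hat y_t)/y_t$ and $\mathrm{MAPE} = \frac{100}{n}\sum_t |y_t - \hat y_t|/|y_t|$, and the coverage percentage as the share of observations falling inside their forecast interval. The sign convention for MPE (observed minus forecast) is an assumption, since the paper does not state it; the numbers in the usage lines are made up.

```python
def mpe(actual, forecast):
    """Mean Percentage Error in %, with errors defined as actual - forecast."""
    return 100.0 * sum((a - f) / a for a, f in zip(actual, forecast)) / len(actual)

def mape(actual, forecast):
    """Mean Absolute Percentage Error in %."""
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)

def coverage(actual, lower, upper):
    """Percentage of observations inside their interval [lower, upper]."""
    hits = sum(lo <= a <= up for a, lo, up in zip(actual, lower, upper))
    return 100.0 * hits / len(actual)

actual = [40.0, 50.0, 80.0, 20.0]
forecast = [44.0, 45.0, 60.0, 25.0]
print(mpe(actual, forecast), mape(actual, forecast))
print(coverage(actual, [30.0, 40.0, 50.0, 22.0], [55.0, 60.0, 70.0, 40.0]))
```

Positive and negative percentage errors cancel in the MPE, so it measures bias, while the MAPE measures the typical size of the errors.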



Finally, Figure 7 displays the observations and the out-of-sample one-step-ahead 95% GARCH and homoscedastic asymptotic forecast intervals for the proposed model. This figure provides a visual comparison of the coverage of these intervals. From this graph, one can see that the GARCH forecast intervals are able to capture the high-volatility periods, which explains the coverage percentages shown in Table 5.




Figure 7: GARCH and homoscedastic 95% forecast intervals of the SARFIMA model of daily average PM10 concentrations from 05/23/2009 to 12/31/2009.
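The difference between the two kinds of interval in Figure 7 can be sketched for one step ahead: the homoscedastic interval uses a constant error variance, whereas the GARCH(1,1) interval uses the conditional variance $h_{t+1} = \alpha_0 + \alpha_1\varepsilon_t^2 + \beta_1 h_t$ from Eq. (7). The sketch assumes Gaussian 95% quantiles and a one-step forecast error equal to $\varepsilon_{t+1}$. The constant variance is here taken as the unconditional variance implied by the GARCH estimates of Table 2, since the paper does not report its homoscedastic variance estimate; the point forecast and the $(\varepsilon_t, h_t)$ states are hypothetical.

```python
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # two-sided 95% Gaussian quantile, about 1.96

def garch11_next_variance(alpha0, alpha1, beta1, eps_t, h_t):
    """One-step-ahead conditional variance h_{t+1} of a GARCH(1,1), Eq. (7)."""
    return alpha0 + alpha1 * eps_t ** 2 + beta1 * h_t

def interval(point, variance, z=Z95):
    """Symmetric forecast interval point +/- z * sqrt(variance)."""
    half = z * variance ** 0.5
    return point - half, point + half

alpha0, alpha1, beta1 = 1.6464, 0.0677, 0.9205   # Table 2 estimates
sigma2 = alpha0 / (1 - alpha1 - beta1)            # constant (unconditional) variance

point = 45.0  # hypothetical one-step forecast of PM10 (ug/m3)
for eps_t, h_t in [(2.0, 60.0), (30.0, 300.0)]:   # hypothetical calm vs volatile state
    h_next = garch11_next_variance(alpha0, alpha1, beta1, eps_t, h_t)
    print("GARCH:", interval(point, h_next), "homoscedastic:", interval(point, sigma2))
```

In calm states the GARCH interval is narrower than the constant-variance one, and after large shocks it widens, which is the mechanism behind the better coverage of CPGFI in Table 5.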

4 Conclusions 

In this paper, a seasonal ARFIMA model with heteroscedastic innovations is applied to model daily average PM10 concentrations. To estimate the fractional parameters, the semiparametric procedure suggested in Reisen et al. (2006a, b) is used under a non-constant conditional error variance. The memory estimates indicated that the series is stationary with the long-memory property at the zero and seasonal frequencies. This is an interesting feature of the data, which supports the use of a more sophisticated model structure. Another equally interesting characteristic is that the squared errors are serially correlated, i.e., the conditional variance of the errors is time-varying. The seasonality, long-memory and volatility features of the data were well captured by the model proposed in this paper, the SARFIMA$(0, d, 1)\times(0, D, 1)_7$-GARCH(1,1) model. The residual analysis and the one-step-ahead forecasts indicated that the SARFIMA-GARCH model is adequate for these data.
Appendix

Proof of Proposition 1. $(X_t)$ can be seen as a fractional ARIMA process, as introduced by Giraitis & Leipus (1995), with GARCH errors. Let $Y_t = \frac{\Phi(B^s)}{\Theta(B^s)}(1 - B^{s})^{D}X_t$. Then $Y_t$ follows an ARFIMA$(p, d, q)$-GARCH$(r, m)$ model according to Ling & Li (1997). Under the assumptions, the power series expansions of $\frac{\Theta(z^s)}{\Phi(z^s)}(1 - z^{s})^{-D}$ and $\frac{\Phi(z^s)}{\Theta(z^s)}(1 - z^{s})^{D}$ converge for $|z| < 1$. Then, based on Theorem 2 of Giraitis & Leipus (1995) and Theorem 2.3 of Ling & Li (1997), statements (a) and (b) follow directly.

□ 